gc: implement basic GC for Git backend #2659
Non-blocking: Perhaps we should also run |
Good point, apparently there is a OTOH, I am only distantly familiar with this (reading manpages) and am quite possibly confused, but AFAICT there's still an issue. We want to run Fortunately, if you don't already know the answer to this, there are plenty of friendly people we could ask... |
I agree that it sounds like
Nice! (if a bit strange) I'm out of my depth now; if you don't, I might ask on some Git forum at some point. In any case, |
FWIW, packing |
After some more reading and testing, it seems clear that
I hope to fix that soon by replacing lots of refs with a single ref to a merge commit.
OT, but I'm not sure if that's better because the dummy merge would be visible in |
Good question. I wonder if people with colocated repos use |
It does contain lots of empty working-copy heads, but the topology is correct. It would be messier if all orphaned heads are merged. No idea if people use |
I asked for suggestions in Git's Discord server. @dscho suggested using reflogs. That seems like a great idea too. Then we can create a merge commit with thousands of parents, point a |
This adds an initial `jj util gc` command, which simply calls `git gc` when using the Git backend. That should already be useful in non-colocated repos because it's not obvious how to GC (repack) such repos. In my own jj repo, it shrank `.jj/repo/store/` from 2.4 GiB to 780 MiB, and `jj log --ignore-working-copy` was sped up from 157 ms to 86 ms. I haven't added any tests because the functionality depends on having a `git` binary on the PATH, which we don't yet depend on anywhere else. I think we'll still be able to test much of the future parts of garbage collection without a `git` binary because the interesting parts are about manipulating the Git repo before calling `git gc` on it.
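As a rough illustration of what "simply calls `git gc`" could look like, here is a minimal Python sketch. It is not the PR's code (jj itself is written in Rust). The helper names `build_gc_command` and `run_gc` are hypothetical, and the caller is assumed to pass the path of the backing Git repository that lives somewhere under `.jj/repo/store/`.

```python
import subprocess
from pathlib import Path


def build_gc_command(git_dir):
    """Return the argv for running `git gc` against the given Git directory."""
    # `--git-dir` points Git at the repository without needing a working tree.
    return ["git", "--git-dir", str(Path(git_dir)), "gc"]


def run_gc(git_dir):
    """Run `git gc`; this requires a `git` binary on the PATH."""
    # check=True raises CalledProcessError if `git gc` exits non-zero.
    subprocess.run(build_gc_command(git_dir), check=True)
```

Keeping the argv construction separate from the subprocess call means that part can be checked without a `git` binary, which is in the spirit of the testing concern described above.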
Interesting. I didn't know that reflog also prevents GC. That will hide anonymous branches from |
I haven't found that discussion yet, but does Git have a setting for how much reflog history garbage collection preserves? Could this setting sometimes be set to 0 (so that GC only preserves the currently existing branch positions)? If Git preserved the entire reflog from GC, it seems GC often wouldn't be able to do much, but I might be missing something.
Reflogs don't need extra merge commits to be created: just add them individually to the reflog. The reflog's expiry is time-based, read: you may want to "refresh" the reflog. (I don't know whether the hack to add "future" reflog entries would upset Git or not.)
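To make the time-based expiry concrete, here is a simplified Python model; it is a sketch, not Git's actual implementation. It assumes a single cutoff for all entries, whereas real Git distinguishes reachable from unreachable entries (`gc.reflogExpire` and `gc.reflogExpireUnreachable`, which default to 90 and 30 days). The names `ReflogEntry`, `surviving_entries`, and `refresh` are hypothetical.

```python
from dataclasses import dataclass, replace

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class ReflogEntry:
    commit_id: str
    timestamp: int  # seconds since the epoch


def surviving_entries(entries, now, expire_days):
    """Keep entries newer than the cutoff; older ones stop protecting commits."""
    cutoff = now - expire_days * SECONDS_PER_DAY
    return [e for e in entries if e.timestamp > cutoff]


def refresh(entries, now):
    """Model "refreshing" the reflog by re-recording every entry at `now`."""
    return [replace(e, timestamp=now) for e in entries]
```

In this model, an entry older than the expiry window is dropped unless `refresh` is applied first, which is why a reflog-based approach would need periodic refreshing to keep old heads alive.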
Yes, that is my biggest concern with the reflog approach. We'd effectively have to tell users that they're not allowed to run
Sorry, the context that I didn't share here is that I'm thinking of replacing refs pointing to thousands of heads with a single ref pointing to a merge commit with thousands of parents. The main reason for that is to make fetching from a remote faster (#293).
Checklist
If applicable:
CHANGELOG.md